An investigation of credit card default prediction in the imbalanced datasets

Alam, Talha Mahboob; Shaukat, Kamran; Hameed, Ibrahim A.; Luo, Suhuai; Sarwar, Muhammad Umer; Shabbir, Shakir; Li, Jiaming; Khushi, Matloob

Relation: IEEE Access Vol. 8, p. 201173-201198
Publisher Link: http://dx.doi.org/10.1109/ACCESS.2020.3033784
Publisher: Institute of Electrical and Electronics Engineers (IEEE)
Resource Type: journal article
Date: 2020
Description: As the financial industry has developed rapidly, financial threats related to the credit risk of commercial banks have grown. One of the biggest threats faced by commercial banks is predicting the risk posed by credit clients. Recent studies mostly focus on enhancing classifier performance for credit card default prediction rather than on building an interpretable model. In classification problems, handling an imbalanced dataset is also crucial to improving model performance, because most cases lie in one class and only a few examples belong to the other categories. Traditional statistical approaches are not well suited to imbalanced data. In this study, a model is developed for credit default prediction using several credit-related datasets. Because there is often a large difference between the minimum and maximum values of different features, Min-Max normalization is used to scale the features to a common range. Data-level resampling techniques, including various undersampling and oversampling methods, are employed to overcome class imbalance. Different machine learning models are also employed to obtain efficient results. We formulated hypotheses on whether the models developed with different machine learning techniques perform significantly the same or differently, and whether resampling techniques significantly improve the performance of the proposed models. One-way analysis of variance (ANOVA), a hypothesis-testing technique, is used to test the significance of the results. The results are validated with the split method, in which the data are divided into training and test sets. On the imbalanced datasets, the models achieve an accuracy of 66.9% on the Taiwan clients credit dataset, 70.7% on the South German clients credit dataset, and 65% on the Belgium clients credit dataset.
Conversely, with our proposed methods, accuracy improves significantly to 89% on the Taiwan clients credit dataset, 84.6% on the South German clients credit dataset, and 87.1% on the Belgium clients credit dataset. The results show that classifiers perform better on balanced datasets than on imbalanced ones. Oversampling techniques are also observed to perform better than undersampling techniques. Overall, the Gradient Boosted Decision Tree method outperforms the other traditional machine learning classifiers, and it gives the best results when combined with the K-means SMOTE oversampling method. Using one-way ANOVA, the null hypothesis was rejected with a p-value < 0.001, confirming that the performance improvement of the proposed model is statistically significant. The interpretable model is also deployed on the web for use by different stakeholders. This model will help commercial banks, financial organizations, loan institutes, and other decision-makers identify loan defaulters earlier.
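Min-Max normalization rescales each feature with x' = (x - min) / (max - min), so every feature falls in [0, 1] on the data used to compute the bounds. The following is a minimal Python sketch, not the authors' code. It assumes the bounds are fitted on the training split and reused on the test split; the helper names `min_max_fit` and `min_max_transform` and the toy feature values are hypothetical.

```python
from typing import List, Sequence, Tuple


def min_max_fit(rows: Sequence[Sequence[float]]) -> List[Tuple[float, float]]:
    """Return per-feature (min, max) pairs computed from the training rows."""
    columns = list(zip(*rows))
    return [(min(col), max(col)) for col in columns]


def min_max_transform(rows, bounds) -> List[List[float]]:
    """Scale each feature with x' = (x - min) / (max - min).

    Constant features (max == min) are mapped to 0.0 to avoid division by zero.
    Test values outside the training range will fall outside [0, 1].
    """
    scaled = []
    for row in rows:
        new_row = []
        for x, (lo, hi) in zip(row, bounds):
            span = hi - lo
            new_row.append((x - lo) / span if span else 0.0)
        scaled.append(new_row)
    return scaled


# Hypothetical toy features (credit limit, age); not data from the paper
train = [[20000.0, 24.0], [120000.0, 26.0], [90000.0, 34.0], [50000.0, 37.0]]
test = [[70000.0, 30.0]]

bounds = min_max_fit(train)              # fitted on the training split only
train_scaled = min_max_transform(train, bounds)
test_scaled = min_max_transform(test, bounds)
print(train_scaled)
print(test_scaled)
```

Fitting the bounds on the training split alone keeps information about the test set out of the preprocessing step.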
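The abstract reports that K-means SMOTE gave the best results with the Gradient Boosted Decision Tree. K-means SMOTE first clusters the data with k-means and then applies SMOTE-style interpolation inside selected clusters. The sketch below shows only the plain SMOTE interpolation step, a simplified stand-in rather than the authors' implementation; the function name `smote`, its parameters, and the toy minority points are hypothetical.

```python
import math
import random
from typing import List, Sequence


def smote(minority: List[Sequence[float]], n_new: int, k: int = 5,
          seed: int = 0) -> List[List[float]]:
    """Create n_new synthetic minority samples by SMOTE-style interpolation.

    Each synthetic point lies on the segment between a randomly chosen
    minority sample and one of its k nearest minority neighbours.
    """
    if len(minority) < 2:
        raise ValueError("need at least two minority samples")
    if k < 1:
        raise ValueError("k must be at least 1")
    rng = random.Random(seed)
    k = min(k, len(minority) - 1)
    synthetic = []
    for _ in range(n_new):
        i = rng.randrange(len(minority))
        base = minority[i]
        # Indices of the k nearest *other* minority samples (Euclidean distance)
        neighbours = sorted(
            (j for j in range(len(minority)) if j != i),
            key=lambda j: math.dist(base, minority[j]),
        )[:k]
        neighbour = minority[rng.choice(neighbours)]
        gap = rng.random()  # uniform in [0, 1)
        synthetic.append([b + gap * (n - b) for b, n in zip(base, neighbour)])
    return synthetic


# Hypothetical, already-scaled minority-class (defaulter) points
minority = [[0.1, 0.2], [0.2, 0.1], [0.15, 0.3], [0.3, 0.25]]
new_points = smote(minority, n_new=4, k=2, seed=42)
print(len(minority) + len(new_points))  # 8
```

For real experiments, the imbalanced-learn library provides `SMOTE` and `KMeansSMOTE` in `imblearn.over_sampling`, both used through `fit_resample(X, y)`. Resampling is normally applied to the training split only, so synthetic points do not leak into the test set.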
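One-way ANOVA, the significance test used in the abstract, compares group means with the statistic F = MS_between / MS_within, where MS_between = sum(n_i (mean_i - grand_mean)^2) / (k - 1) and MS_within = sum of squared deviations from each group mean / (N - k). A minimal, dependency-free sketch of that computation follows; the function name `one_way_anova_f` and the accuracy values are hypothetical and are not results from the paper.

```python
from statistics import fmean
from typing import Sequence, Tuple


def one_way_anova_f(*groups: Sequence[float]) -> Tuple[float, int, int]:
    """Return (F, df_between, df_within) for a one-way ANOVA."""
    k = len(groups)
    n_total = sum(len(g) for g in groups)
    if k < 2 or n_total <= k:
        raise ValueError("need at least two groups and more observations than groups")
    grand_mean = fmean(x for g in groups for x in g)
    # Between-group variation: how far each group mean is from the grand mean
    ss_between = sum(len(g) * (fmean(g) - grand_mean) ** 2 for g in groups)
    # Within-group variation: spread of observations around their own group mean
    ss_within = sum(sum((x - fmean(g)) ** 2 for x in g) for g in groups)
    df_between, df_within = k - 1, n_total - k
    f_stat = (ss_between / df_between) / (ss_within / df_within)
    return f_stat, df_between, df_within


# Hypothetical accuracy (%) of three models over three repeated splits
model_a = [60, 62, 64]
model_b = [70, 72, 74]
model_c = [80, 82, 84]
f_stat, df_b, df_w = one_way_anova_f(model_a, model_b, model_c)
print(f_stat, df_b, df_w)  # 75.0 2 6
```

Turning F into a p-value requires the F distribution with (k - 1, N - k) degrees of freedom; `scipy.stats.f_oneway(*groups)` returns both the statistic and the p-value.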
Subject: machine learning; imbalanced data; customer credit risk; credit card default model; interpretable model; gradient boosted decision tree
Identifier: http://hdl.handle.net/1959.13/1427484
Identifier: uon:38537
Identifier: ISSN:2169-3536
Rights: This work is licensed under a Creative Commons Attribution 4.0 License. For more information, see https://creativecommons.org/licenses/by/4.0/.
Language: eng
Reviewed


File	Description	Size	Format
ATTACHMENT01	Author final version	5 MB	Adobe Acrobat PDF
ATTACHMENT02	Publisher version (open access)	5 MB	Adobe Acrobat PDF